ABSTRACT

A visual segmentation mechanism for a connectionist pattern recognition system is 
sought. However, to find such a device requires the solution of the dynamic binding prob- 
lem. Visual segmentation could be learned by a dynamic binding network. Several puta- 
tive dynamic binding mechanisms are discussed but each is found to have weaknesses. 
Two mechanisms are being studied in greater depth so that their weaknesses may be re- 
solved. Also a micro-world for the simulation of visual segmentation tasks is described. 

1. INTRODUCTION 

Simpson et al. (1992) have shown that an artificial neural network (ANN) can discriminate between
preprocessed images of Ceratium arcticum (Ehrenberg) and Ceratium longipes (Bailey), two dinoflagellate
plankton species. The input patterns on which the network was trained were outline drawings of plankton
specimens taken from photomicrographs or camera lucida images. Each outline drawing was digitised
and the frequency histogram of the image's power spectrum was determined by a Fast Fourier Transform
(FFT). The frequency gradient of the lowest 16 frequency bins of the histogram comprised the input to
the network.

However, the ANN plankton classifier is not able to classify plankton specimens contained in images
where more than one specimen is present (see Figure 1) or where there are also large items of debris such
as fragments of broken plankton or air bubbles. As a result the network is trained and tested with images
of single plankton specimens with no large items of clutter. This approach, therefore, fails to address an
important challenge for machine vision: enabling artificial vision systems to deal with
visual images which contain more than one object. In contrast to the ANN plankton classifier, human
vision segments an image into its constituent objects.

The objective of this study is to improve the robustness of the ANN plankton classifier by incorporating 
a mechanism which enables recognition of an object separate from the recognition of other objects con- 
tained within an image. This mechanism must be able to segment an image into its constituent objects, 
then generate a representation in which the information relating a single object is grouped together and 
kept separate from information about other objects. Psychological and connectionist theory will inform 
the possible mechanisms considered. In particular a general purpose segmentation mechanism is sought, 
not one that is only able to segment images of plankton. An exciting possibility is a mechanism which 
learns for itself how to segment images from a particular domain. 

Firstly this report will comment on the psychology of visual segmentation, in which failures of the process
are thought to yield insights into how human brains perform segmentation. Next, a link between
segmentation and connectionism's dynamic binding problem is established and some putative binding
mechanisms are discussed. Then the progress towards finding an appropriate segmentation mechanism is
considered and the future directions of this research are identified.
Figure 1. A photomicrograph of two Ceratium arcticum and debris.

2. VISUAL SEGMENTATION

Segmentation not only occurs in vision but is fundamental to all perceptual processes. For instance, during
auditory language perception sound is grouped spatially to the location of its source and is grouped
temporally into phonemes and words. Treisman & Gelade (1980) describe segmentation as "the process of
grouping of information over the spatial extent of an object ... so that features belonging to one object are
not confused with those of another object" (p. 97). Unfortunately this definition only considers those
objects which extend spatially. It ignores the possibility of objects, such as sounds, which extend over time.
Therefore, Treisman & Gelade have only defined visuo-spatial segmentation. Also it is important to
distinguish between two possible uses of the term object. Object1 refers to an object as it exists in the physical
world, whereas object2 refers to the mental representation a perceiver has of an object1. Treisman &
Gelade's definition uses object in the sense of object1. In this report the term object should be taken to mean
object1 and the term perceptual object should be understood as object2. Also for the purposes of this report
the components of a representational object shall be referred to as elements or representational elements.

Perceptual processes are so reliant on segmentation that it is difficult for us to imagine what unsegmented
experiences would be like. Only under very unusual circumstances do errors of segmentation occur. One
example which gives an impression of what unsegmented perception might be like occurs when one looks
at a television screen from close up. At a distance of a few centimetres the picture cannot be discerned.
All that can be seen is a mosaic of red, blue and green dots. At this distance from the television our visual
knowledge is insufficient to group these coloured elements together to represent the objects displayed on
screen. The reason is that this visual experience is very different from that which we normally
encounter. As one moves back from the television screen there comes a point where one is able to group
together the dots of coloured light and segment the image into objects. This point is where the image
becomes enough like the normal visual world for our visual knowledge to enable segmentation.

Segmentation and object recognition are intimately related; in fact it appears that in order to work both
processes require information from each other. It is possible that the processes of segmentation and object
recognition occur in parallel. Obviously, an object can only be recognised once it has been isolated from
an image. But accurate segmentation also relies on knowledge about the internal structure of objects. The
"too close to the TV" example, mentioned earlier, demonstrates that in situations where a person's visual
knowledge is limited by lack of experience they are unable to segment effectively. However, we are able
[...] tachistoscopically presented with a visual array of coloured letters and asked to respond if a certain letter and
colour combination is present in the stimulus. It is found that subjects will frequently respond when the 
colour and the letter are both present in the array but are separate. Theories of visual segmentation must 
account for the occurrence of false conjunctions. 

The search for a segmentation mechanism which can be instantiated in a connectionist network is a facet 
of the dynamic binding problem. There now follows a discussion of dynamic binding and the types of 
connectionist representations which have been proposed as solutions to the problem. 

3. THE DYNAMIC BINDING PROBLEM 

Dynamic binding is the representation of objects by the temporary conjunction of two or more
representational elements. It is required by any cognitive process that exhibits systematicity and compositionality
(Shastri & Ajjanagadde, in press); such processes include the production and comprehension of
language, logical inference and visual segmentation. For example, Treisman and Gelade's definition of
visual segmentation includes the term "the grouping of information". This term is equivalent to binding.
Connectionist theory will be unable to account for systematic and compositional cognitive processes
unless a mechanism which can perform dynamic binding in ANNs can be found. However, none of the
binding mechanisms devised to date is fully satisfactory. There shortly follows a discussion of these putative
binding mechanisms which will highlight each mechanism's strengths and weaknesses. These binding
mechanisms fall into four categories: associationist representations, enumerated representations, phase
synchrony in oscillatory networks and recursive distributed representations (RDR).

3.1. Associationist Representations

Associationist representations are the simplest of connectionist representations. They identify those fea- 
tures which are present but not how the features are grouped into objects. Therefore, they are unable to 
simultaneously represent more than one object. It is this weakness of associationist representations which
brought attention to the dynamic binding problem for connectionism. To see how this type of representa- 
tion fails to describe two objects at once the following test will be applied. The test is to represent the 
simplest possible two object description. Each object is the conjunction of two representational elements: 
a colour and a geometric shape. There are two possible colours, i.e. red or blue, and two possible shapes, 
i.e. square or triangle. If a putative dynamic binding mechanism is able to represent a red triangle and a 
blue square then it is worthy of consideration. An associationist representation of red triangle and blue 
square would have active units standing for the elements red, blue, square, and triangle. However, there
is no difference between the associationist representation for red triangle and blue square and the repre- 
sentation for blue triangle and red square. Therefore, associationist representations are unable to perform 
binding. 
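
The two-object test above can be sketched in a few lines. This is a minimal illustration, assuming each object is written as a (colour, shape) pair and each representational element has one unit that is simply on or off; the function and variable names are hypothetical and are not taken from any of the networks discussed.

```python
def associationist(scene):
    """Return the set of active element units for a scene.

    A scene is a list of (colour, shape) objects. Only the presence of
    each element is recorded; which colour belongs to which shape is lost.
    """
    active = set()
    for colour, shape in scene:
        active.add(colour)
        active.add(shape)
    return active


scene_a = [("red", "triangle"), ("blue", "square")]
scene_b = [("blue", "triangle"), ("red", "square")]

code_a = associationist(scene_a)
code_b = associationist(scene_b)

# Both scenes switch on the same four units, so they cannot be told apart.
print(sorted(code_a))
print(code_a == code_b)
```

Because the representation is a set of active elements, any rearrangement of the same colours and shapes among the objects yields an identical pattern.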

3.2. Enumerated Representations

Enumeration is the most widely used binding technique. It is a representation in which there are units
standing for each of the possible conjunctions of representational elements. For example, an enumerated
representation which allows the representation of red square and blue triangle would have a unit for each
pairing of colour and shape, i.e. red square, blue square, red triangle and blue triangle. This is a localist enu-
Another major weakness of enumerated representation is that the representational elements are unbound.
For instance, in an enumerated representation of red triangle and red square there is nothing to indicate
that the red in red square is the same as the red in red triangle. This is the equivalence problem of
enumerated representations. Enumeration does not dynamically bind representational elements together at all:
the elements are not dynamically bound, they are permanently linked.
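
The equivalence problem can be made concrete with a small sketch. It assumes a localist code with one unit per colour-shape conjunction and measures the similarity of two patterns by their overlap (dot product); the names and the choice of overlap as a similarity measure are illustrative assumptions rather than part of any published model.

```python
from itertools import product

COLOURS = ["red", "blue"]
SHAPES = ["square", "triangle"]

# One localist unit per colour-shape conjunction.
UNITS = list(product(COLOURS, SHAPES))


def enumerated(scene):
    """Return a 0/1 activation vector with one entry per conjunction unit."""
    return [1 if unit in scene else 0 for unit in UNITS]


def overlap(a, b):
    """Number of units active in both patterns."""
    return sum(x * y for x, y in zip(a, b))


# Unlike the associationist code, the two test scenes now differ.
scene_a = enumerated([("red", "triangle"), ("blue", "square")])
scene_b = enumerated([("blue", "triangle"), ("red", "square")])
print(scene_a, scene_b)

# Equivalence problem: red square shares no more with red triangle
# than it does with blue triangle, although both are red.
red_square = enumerated([("red", "square")])
red_triangle = enumerated([("red", "triangle")])
blue_triangle = enumerated([("blue", "triangle")])
print(overlap(red_square, red_triangle), overlap(red_square, blue_triangle))
```

The conjunction units solve the two-object test, but nothing in the code marks the two red objects as sharing an element.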

3.3. Phase Synchrony in Oscillatory Networks

In this technique binding is represented by allocating bound elements to the same phase
in an oscillatory network. Oscillatory networks have units whose activation fluctuates over time. Binding
is represented using the phase of the waveform of this oscillation of activation. The elements of a
representational object all have the same phase.

Support for phase synchrony binding comes from neuropsychological evidence which shows that the
activity of non-adjacent hypercolumns in a cat's visual cortex will become phase locked when perceiving
different parts of the same moving object (Gray et al., 1989). Also it is claimed that limits on the number
of possible discrete phases an oscillating system can maintain may explain some constraints on the
cognitive abilities of humans. It is argued that accidental synchrony between units can account for false
conjunctions (Hummel & Biederman, 1990) and for the 7±2 constraint on short-term memory (Shastri &
Ajjanagadde, in press).

Most oscillatory networks have complicated mechanisms for producing the oscillation of activation and
for establishing phase synchrony. For instance, Hummel & Biederman's (1990) network co-ordinates the
phases of units by activating fast enabling links. These are links which operate on a time-scale several
times faster than that of ordinary connections. Many oscillatory networks have activation waveforms that
are convoluted and even chaotic at times (e.g. Horn, Sagi & Usher, 1991). The complexity of these
networks and their behaviour are barriers to our understanding of phase synchrony as a binding mechanism.

In contrast to these complicated networks, the oscillatory activation of Mozer, Zemel & Behrmann's
(1992) MAGIC is implicit. In this network, activation is represented by complex numbers in polar form.
The amplitude of activation represents confidence in the presence of an element. The phase of activation
represents binding between elements. MAGIC has been trained to segment images of a micro-world
comprised of simple geometric shapes. Representations of these shapes are constructed using four types of
feature: lines at 0°, 45°, 90° and 135° to the vertical. There is a unit for each possible conjunction of feature
and location, and these units are grouped as feature maps. When an input pattern is first presented, the phase of each
unit is random. The network then relaxes to a stable pattern of activation in which elements from the same
object have the same phase. Also the network has a learning rule which is able to find a set of complex
valued weights that will perform this segmentation. But despite the success of MAGIC, phase synchrony
is a poor dynamic binding mechanism because it is unable to represent shared features.

Existing oscillatory networks do not allow an element to be part of two objects at the same time because
the unit representing such an element would need to have two phases, which is not possible. For instance,
to represent red circle and red square, the element for red must be bound to the element for square, whose
unit has one phase, and be bound to the element for circle, whose unit has a different phase. Therefore the
red unit requires two phases. A new kind of oscillatory network would be needed to overcome this
problem of representing shared elements.
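
Both the phase code and its shared-element weakness can be sketched with complex numbers in polar form. The following is only an illustration of the representation, assuming amplitude stands for confidence and phase for binding as described above; it is not the relaxation dynamics of MAGIC or of any other oscillatory network, and all names and phase values are hypothetical.

```python
import cmath
import math


def unit(amplitude, phase):
    """Activation in polar form: amplitude is confidence, phase is the binding tag."""
    return cmath.rect(amplitude, phase)


def same_phase(z1, z2, tolerance=1e-6):
    """True if two activations have the same phase (difference wrapped to (-pi, pi])."""
    return abs(cmath.phase(z1 * z2.conjugate())) < tolerance


def group_by_phase(units):
    """Read out objects by collecting the elements whose units share a phase."""
    groups = []  # each group is a list of (name, activation) pairs
    for name, z in units.items():
        for group in groups:
            if same_phase(z, group[0][1]):
                group.append((name, z))
                break
        else:
            groups.append([(name, z)])
    return [sorted(name for name, _ in group) for group in groups]


# Red triangle and blue square: one phase per object.
two_objects = {
    "red": unit(1.0, 0.0),
    "triangle": unit(1.0, 0.0),
    "blue": unit(1.0, math.pi / 2),
    "square": unit(1.0, math.pi / 2),
}
print(group_by_phase(two_objects))

# Red circle and red square: the single red unit can carry only one phase,
# so it can be grouped with circle or with square, but not with both.
shared = {
    "red": unit(1.0, 0.0),
    "circle": unit(1.0, 0.0),
    "square": unit(1.0, math.pi / 2),
}
print(group_by_phase(shared))
```

In the second scene the square is left without a colour, whichever phase the red unit is given.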
3.4. Recursive Distributed Representations

The failure of all of the above-mentioned putative dynamic binding mechanisms is possibly founded on
their use of a restricted compositional method, concatenative compositionality (van Gelder, 1990). This
form of compositionality is the construction of structured representations from a set of tokens (i.e.
representational elements) by linking or ordering them so that these tokens are unchanged. For example, in
propositional logic the statement (p&q) retains the tokens p and q. However, van Gelder also argues that
preservation of tokens is not necessary for compositionality. There is also functional compositionality in which
complex expressions do not contain tokens of their constituents, although these constituents are
retrievable.

Pollack's (1989) Recursive Auto-Associative Memory (RAAM) is able to form representations which appear to
possess functional compositionality. This type of representation is called a recursive distributed
representation (RDR). RDRs, as their name suggests, are produced in a distributed network by a recursive process.
Over several serial steps a global representation of a group of elements is produced by adding a further
element to the global representation produced by the last step. A RAAM would have no trouble in
representing red triangle and blue square. Up until now RDR had not been proposed as a dynamic binding
mechanism.
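
The recursive step described above, in which a further element is folded into the global representation from the previous step, can be sketched as follows. This is a deliberately incomplete illustration: the weights are random and untrained, there is no reconstructor network, and the hidden width, the tanh squashing function and all names are assumptions made for the example. A real RAAM trains the compressor together with a reconstructor as an auto-associator so that the constituents can be retrieved.

```python
import math
import random

ELEMENTS = ["red", "blue", "square", "triangle"]
WIDTH = 6  # hypothetical size of the hidden (global) representation


def one_hot(name):
    """Local code for a single representational element."""
    return [1.0 if element == name else 0.0 for element in ELEMENTS]


def make_compressor(seed=0):
    """Random, untrained weights mapping [previous hidden ; element] to hidden."""
    rng = random.Random(seed)
    n_inputs = WIDTH + len(ELEMENTS)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)] for _ in range(WIDTH)]


def compress(weights, hidden, element):
    """One recursive step: fold a further element into the global representation."""
    inputs = hidden + one_hot(element)
    return [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in weights]


def encode(weights, sequence):
    """Build a fixed-width representation of a whole sequence, one element per step."""
    hidden = [0.0] * WIDTH
    for element in sequence:
        hidden = compress(weights, hidden, element)
    return hidden


weights = make_compressor()
scene_a = encode(weights, ["red", "triangle", "blue", "square"])
scene_b = encode(weights, ["blue", "triangle", "red", "square"])

# The representation has the same width however many elements are folded in,
# and the two scenes receive different codes because order affects the result.
print(len(scene_a), scene_a != scene_b)
```

The sketch shows only that the composed code is fixed-width and order-sensitive; whether the constituents are retrievable depends on training, which is omitted here.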

However, there is a weakness of the RDR approach to visual segmentation. It is very unlikely that visual
segmentation is a serial process, but RDRs are composed and decomposed serially. It takes us only a
fraction of a second to make sense of a new visual image. The speed of this process strongly suggests that
the brain computes visual segmentation in parallel. However, the representation developed by RAAMs
on their hidden units is able to encode binding of elements despite the serial process by which the hidden
activation representations were formed. Therefore, understanding how objects are represented within
RDRs may help us to solve the dynamic binding problem.

4. PROGRESS 

The progress made on this project has been divided between three areas. The first of these areas has been 
to design a micro-world which approximates to real-world visual segmentation. Another area of progress 
has been the development of Polarnet: a complex domain oscillatory network. The final area of progress
was a preliminary study of a RAAM's performance when trained with patterns taken from the micro- 
world. These areas of progress and their future development will be discussed in turn. 

4.1. Micro-world

The segmentation of real-world images is a complex and computationally intensive process which is far
beyond the capability of networks comprised of only a few tens of units. Therefore, a micro-world has
been developed whose images are much easier to segment than real-world images. It is intended that
binding networks be trained and tested with micro-world images to assess their viability. A micro-world
is a simulated universe which is defined by a few simple rules. These rules, which can be thought of as
the micro-world's laws of nature, should reflect those characteristics of the real-world which may influ-



Figure 2. Four Micro-world Plankton at Different Rotations 



4.1.2. Boundary Problems

A micro-world of a visual domain must define how objects are represented when some of an object's
features are located over the micro-world's horizons. One possible interpretation is to disallow those objects
which cross a micro-world horizon in this way. However, this interpretation causes problems. If such objects
are disallowed, certain feature-location conjunctions become more common than others. This situation is definitely not true
of human vision. Also some of these conjunctions are more frequently associated with a particular species
of micro-world plankton, which becomes the basis by which plankton species are categorised. This is
quite unlike real-world categorisation. A more realistic categorisation task is based on the recognition
of a spatial arrangement of features within an object. This sort of categorisation will occur in the absence
of simpler cues to recognition. To ensure the even distribution of feature-location conjunctions, those
features of objects which disappear over the micro-world horizon will reappear over the opposite horizon
(see Figure 3). This interpretation permits 36 different plankton exemplars.
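
The wrap-around rule can be sketched as modular arithmetic on grid coordinates. The sketch assumes a 3 x 3 grid (giving nine body locations) and four rotations, which together account for the 36 exemplars; the three-cell shape used here is purely hypothetical, since the actual micro-world plankton are defined by oriented line features that are not reproduced in this example.

```python
from collections import Counter

GRID = 3  # assumed 3 x 3 grid, giving 9 possible body locations

# Hypothetical plankton: a body at the origin with two "arms".
SHAPE = [(0, 0), (0, 1), (1, 0)]


def rotate(cells, quarter_turns):
    """Rotate cell offsets about the body by a number of 90 degree turns."""
    for _ in range(quarter_turns):
        cells = [(-y, x) for x, y in cells]
    return cells


def place(body_x, body_y, quarter_turns):
    """Occupied cells, with features that cross a horizon reappearing opposite."""
    cells = rotate(SHAPE, quarter_turns)
    return frozenset(
        ((body_x + dx) % GRID, (body_y + dy) % GRID) for dx, dy in cells
    )


exemplars = {
    (x, y, r): place(x, y, r)
    for x in range(GRID)
    for y in range(GRID)
    for r in range(4)
}

# 9 body locations x 4 rotations, all giving distinct patterns.
print(len(exemplars), len(set(exemplars.values())))

# With wrap-around every location is occupied equally often.
occupancy = Counter(cell for cells in exemplars.values() for cell in cells)
print(sorted(occupancy.values()))
```

Without the modulo operation, placements near the edge would have to be discarded and the central locations would be occupied more often than the peripheral ones.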
Figure 3. Micro-world plankton whose features cross over the domain's horizon.
4.1.3. Images of Multiple Plankton Exemplars

Plankton-world allows the simultaneous representation of more than one plankton in an image. There are
630 possible two-object images and over 23,000 three-object images. Even in this simple micro-world
the number of combinations of objects is large. When more than one object is in an image, the objects may
overlap. Figure 4 shows pairs of plankton in the different degrees of overlap permitted
by Plankton-world.
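
The two-object figure is consistent with choosing unordered pairs of distinct exemplars from the 36 available, as the following check shows. That this is how the figure was obtained is an assumption; the counting convention behind the three-object figure is not stated in the text, so it is not reproduced here.

```python
import math

n_exemplars = 36  # from Section 4.1.2

# Unordered pairs of distinct exemplars.
two_object_images = math.comb(n_exemplars, 2)
print(two_object_images)
```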

4.2. Polarnet

Polarnet has been inspired by MAGIC (Mozer et al., 1991) but there are several key differences between
them which shall be highlighted in the following description of Polarnet.

4.2.1. Architecture

A crucial difference between Polarnet and MAGIC is that Polarnet is a multilayer feed-forward network 
whereas MAGIC is a recurrent network. Therefore it will be simpler to analyse Polarnet's performance 
than MAGIC's. Recurrent networks develop hidden layer representations which defy interpretation. It is
envisaged that existing techniques for understanding the representations developed by real-valued feed- 
forward networks (e.g. Hanson & Burr, 1990) will be modified for the analysis of Polarnet. 

4.2.2. Learning

Polarnet, like MAGIC, is trained through the back-propagation of error. However, the error measure used
in Polarnet is unlike the error measure employed in MAGIC. Mozer et al. use an error measure which
appears to be based on the Hamming distance between the network's actual and target outputs. Polarnet's
error measure, which is shown in Equation 1, is the generalisation to the complex domain of the error
measure which underlies the delta or Widrow-Hoff learning rule.

$$E = \frac{1}{2} \sum_i \left( r_i^2 + s_i^2 - 2 r_i s_i \cos(\phi_i - \theta_i) \right) \qquad (1)$$

where $E$ is error, $r_i$ is the actual amplitude of unit $i$, $s_i$ is the target amplitude of unit $i$, $\phi_i$ is the target phase
of unit $i$ and $\theta_i$ is the actual phase of unit $i$. The learning rule is derived by partial differentiation of $E$ with
respect to $w_{ij}$, where $w_{ij}$ is the complex valued weight of the connection between unit $i$ of the hidden layer
and unit $j$ of the output layer. The learning rule is the negative product of this partial differential and a real
valued constant which controls the rate of learning. Rumelhart, Hinton and Williams (1986) have shown
that the error between the output and target activation vectors can be propagated back through the network
to amend the weights of connections in all the layers. However, the derivation of the backpropagation
learning rule must take account of a network's activation rule.
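
Equation 1 can be read as half the summed squared distance between target and actual activations in the complex plane, which is what makes it a generalisation of the squared-error measure behind the delta rule. The sketch below checks this reading numerically; it assumes each activation is given as an (amplitude, phase) pair, and the function names and sample values are hypothetical.

```python
import cmath
import math


def polar_error(actual, target):
    """Equation 1, with activations given as (amplitude, phase) pairs."""
    total = 0.0
    for (r, theta), (s, phi) in zip(actual, target):
        total += r * r + s * s - 2.0 * r * s * math.cos(phi - theta)
    return 0.5 * total


def cartesian_error(actual, target):
    """Half the summed squared distance between the same complex activations."""
    total = 0.0
    for (r, theta), (s, phi) in zip(actual, target):
        total += abs(cmath.rect(s, phi) - cmath.rect(r, theta)) ** 2
    return 0.5 * total


actual = [(0.8, 0.3), (0.2, -1.0)]
target = [(1.0, 0.0), (0.0, 0.0)]

# The polar and Cartesian forms agree to within rounding error.
print(polar_error(actual, target), cartesian_error(actual, target))
```

The agreement follows from the law of cosines applied to each pair of activations.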
Figure 4. Examples of micro-world images in which there are different types
of overlap between two plankton: (a) is of non-overlapping objects, (b) and
(c) are of two objects with different features at the same location, (d) and (e) 
are of two objects with the same feature at the same location, and (f) is of two 
objects with two of the same features at the same locations. 



4.2.3. Activation

Polarnet's net input function is the dot product of the activation and weight vectors generalised to the
complex domain, which is the same as MAGIC's input function. However, no satisfactory output function has
been found for Polarnet as yet. The network requires an output function which maps the net input into
a unit to a complex number in the range $0 < r_i < 1$ and $-\pi < \theta_i \le \pi$. Values for $r_i$ and $\theta_i$ which are outside
of this range cannot be defined in polar-form complex numbers. The output function should also be
non-linear and allow partial differentiation with respect to $w_{ij}$. Meeting these conditions will allow the
network to use a back propagation learning rule. An early version of Polarnet employed the logistic function
shown in Equation 2, which proved to be unsatisfactory.

$$r_i = \left( 1 + e^{-|n_i|} \right)^{-1} \qquad (2)$$

where $r_i$ is the amplitude of unit $i$ and $n_i$ is the net input to unit $i$. In Polarnet values of $|n_i|$ are never negative
and this function maps non-negative values to the range $1/2 \le r_i < 1$. Consequently the network fails to learn
the input-output mappings with which it is trained. A promising alternative to the logistic function is
shown in Equation 3 (Georgiou, 1993).

ftnj)~n(R|n|r* (3) 
Polarnet has yet to be implemented with this activation function. 
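
The range restriction that makes the logistic function of Equation 2 unsatisfactory is easy to demonstrate. The sketch below assumes that the logistic is applied to the modulus of a complex net input, as the text describes; the sample inputs are arbitrary.

```python
import math


def logistic_amplitude(net_input):
    """Logistic function applied to the modulus of a complex net input."""
    return 1.0 / (1.0 + math.exp(-abs(net_input)))


samples = [0j, 0.1 + 0.2j, -3 + 4j, 6 - 8j]
amplitudes = [logistic_amplitude(n) for n in samples]

# The modulus is never negative, so no amplitude falls below one half.
print(amplitudes)
```

A unit using this function can therefore never signal low confidence in an element, whatever its weights.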

4.2.4. Future Work

Once a suitable activation function has been found simulations of Polarnet will be undertaken to study 
the network's ability to segment images taken from Plankton-world. The network's performance will be
Alongside the development of Polarnet, investigation of a recursive distributed representation approach
to visual segmentation of Plankton-world images has been undertaken. The input pattern representing 
an unsegmented image is divided into several sub-patterns which are presented serially to a RAAM. Each 
sub-pattern stands for the conjunction between a feature and its location. The sequential order of sub-pat- 
terns denotes the object to which the feature belongs. 

For a RAAM to map a random sequence of features to a segmented sequence of features, a transformation 
of the network's hidden representations is required. Several researchers have simulated RAAM networks 
in which such transformations are made (Chalmers, 1990; Chrisman, 1991). The transformation process 
can be facilitated as follows. A RAAM is trained to auto-associate input patterns taken from the micro- 
world. These patterns are presented as a randomly ordered series of sub-patterns from which the network 
recursively generates a global representation of the sequence. After training the network's hidden repre- 
sentation for each input pattern is found. A second RAAM network is trained to auto-associate the set 
of target output patterns in which the features are sequentially grouped into objects. This network's hidden 
layer representations of the target patterns are found after training. A third network performs the trans- 
formation in which the first RAAM's representations of the input patterns are mapped to the second 
RAAM's representations of the segmented output. A single network can be constructed from some of the
units and weights of the original networks which should map unsegmented images to segmented images.
Chrisman (1991) has devised a variant of this technique which can be implemented in a single network,
which he calls a dual-ported RAAM. Simulations of a dual-ported RAAM will be undertaken for com- 
parison with Polarnet. 
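
The input and target sequences for this scheme can be sketched as plain data. The example assumes that each sub-pattern is a (feature, location) pair and that the segmented target simply lists each object's sub-patterns contiguously; the feature names, locations and object labels are hypothetical, and no network is trained here.

```python
import random

# Hypothetical image: two objects, each a list of (feature, location) conjunctions.
objects = {
    "plankton_1": [("line_0", (0, 0)), ("line_90", (0, 1))],
    "plankton_2": [("line_45", (2, 1)), ("line_135", (2, 2))],
}


def segmented_sequence(objects):
    """Target: sub-patterns ordered so that each object's features are contiguous."""
    return [sub for name in sorted(objects) for sub in objects[name]]


def unsegmented_sequence(objects, seed=0):
    """Input: the same sub-patterns presented in a random order."""
    sequence = segmented_sequence(objects)
    random.Random(seed).shuffle(sequence)
    return sequence


target = segmented_sequence(objects)
network_input = unsegmented_sequence(objects)

# The two sequences contain the same conjunctions; only the order differs.
print(network_input)
print(target)
```

The transformation network's task is then to map the code for the first sequence to the code for the second.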

5. CONCLUSIONS 

Segmentation and dynamic binding are seen to be intimately related when perception is studied from a
connectionist viewpoint. Whilst no psychologically plausible or entirely adequate segmentation
mechanism is suggested by connectionist work on the dynamic binding problem, two approaches, phase
synchrony in oscillatory networks and recursive distributed representations, do offer some promise.
Investigations into the viability of these two binding mechanisms will continue.
To: Jane_Wolfe@bstz.com <Jane_Wolfe@bstz.com>
cc: 

Subject: FW: Interpretation of paper 



Jane, it appears that neither the school nor the author have copies of that 
paper any more. In this email, the author tries to fill in what was missing 
on the portions of the article that got cut off. 

Original Message 

From: Phil Culverhouse [mailto:P.Culverhouse@plymouth.ac.uk] 

Sent: Tuesday, January 25, 2005 2:23 AM 

To: Carlos Medina 

Subject: Interpretation of paper 

Dear Carlos, 

This is the best we can do. I hope it clarifies your understanding. Are you 
working on a related topic?? 

Kind regards, 

Phil 

Original Message 

From: Smith Graham D [mailto:Graham.D.Smith@northampton.Ac.Uk] 
Sent: 25 January 2005 10:16 
To: Phil Culverhouse 
Subject: RE: Hello! 



Phil 

My memory is good but not good enough to recall all the missing details. It 
looks as though 6 or so lines of text have been lost from the top of each 
page. Much of the lost text on the early pages (e.g., the missing part of 
Section two which described the phenomenon of illusory conjunctions) is not 
needed to understand the real contribution of the paper; Polarnet (and the
domain I suppose). So to sum up, the paper identifies phase synchrony in
polar-form MLPs and recursive distributed representations like those used by 
Pollack's RAAM as potential dynamic binding representations. 

The missing part of Section 3.2 appears to identify problems with enumerated 
representations of dynamic binding domains. What appears to be missing is 
discussion of the scaling problem; i.e., enumerated representations of any 
real world domain require unfeasibly large number of units. The equivalence 
problem, not previously mentioned in the literature, but related to Fodor &
Pylyshyn's compositionality, is described without any apparent omissions.

I recall that Figure 3 was of examples of "micro-plankton" whose "arms" 
cross the domain horizon and therefore appear on the other side of the grid.
This "unrealistic" arrangement was necessary to ensure an even distribution
of features to locations and thereby avoid the network solving the learning
problem using simple feature relationships. The number of plankton exemplars
(i.e., 36) is given by the number of locations for the "micro-plankton" body
(i.e., 9) multiplied by the number of possible orientations of the 
"micro-plankton" (i.e., 4). 

I don't believe that Polarnet's learning rule was stated in the paper (there
are no missing equations). I describe the polar valued version of the
logistic function as being an unsatisfactory (for reasons I do not recall)
activation output function. I offer an alternative function but do not
present a derivation of the learning rule from it. I have been unable to
find Polarnet's learning rule in my notebooks. However, I recall that the
learning rule was derived from the activation rule and error function
following the strategy used by Rumelhart and McClelland as described in the
PDP bible and Rumelhart, Hinton and Williams (1986). Also, anyone interested in
polar-form NNs should look at G.M. Georgiou's work.

The paper described work in progress however subsequently my simulations 
focused on the RAAM network and encoders rather than Polarnet which was 
side-lined. My reason for this change of tack was that I realized that there 
is no mapping that a polar-valued activation network can perform that a 
normal back-prop net cannot perform. In fact, expressing the activation and 
learning rules in polar co-ordinates is merely a redescription of the 
Cartesian form of the rules. In other words if Polarnet can learn the 
input-output mapping then so can the equivalent backprop MLP. Better then to 
perform simulations on the well understood MLP than the largely unknown
Polarnet.

I hope this helps. 

Regards 

Graham 

Dr Graham D. Smith 
Division of Psychology 
University College Northampton 
Boughton Green Road 
NORTHAMPTON 
NN2 7AL 

Tel: + 44 (0) 1604 735500 Ext 2393 
E-mail: graham.d.smith@northampton.ac.uk



